Human Genetics
○ Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Human Genetics's content profile, based on 25 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Najarzadeh Torbati, P.; Hallbrucker, L.; Hofrichter, M. A. H.; Owrang, D.; Setzke, J.; Kilimann, M. W.; Hemmatpour, A.; Rajati, M.; Ghayoor Karimiani, E.; Haaf, T.; Vogl, C.; Vona, B.
Hereditary hearing loss is highly genetically heterogeneous, with emerging overlap between genes implicated in early-onset and age-related hearing loss. We report a consanguineous family with autosomal recessive, non-syndromic hearing loss in which the proband harbors a homozygous splice-site variant in PALM3 (NM_001145028.2:c.314+1G>A) and a homozygous missense variant in OTOA. A minigene assay for the PALM3 variant demonstrated aberrant splicing with exon skipping, resulting in a frameshift and a large in-frame deletion, both consistent with loss of function and impacting all known transcripts. While the organ of Corti from 12-month-old heterozygous Palm3 mice showed preserved overall architecture, published Palm3 knockout mice exhibit auditory dysfunction, supporting an auditory phenotype with loss of function. Although a dual molecular diagnosis cannot be excluded, the combined genetic, functional, and comparative data support PALM3 as a strong candidate gene for autosomal recessive hearing loss.
Martin, A.; Llanes-Cuesta, M. A.; Hartley, J. N.; Frosk, P.; Drogemoller, B. I.; Wright, G. E. B.
Introduction: Neuromuscular disorders (NMDs) encompass a broad group of conditions that primarily affect the peripheral nervous system. They are often caused by genetic alterations that impair skeletal muscle function and result in debilitating symptoms. Obtaining an accurate molecular diagnosis remains a challenge, potentially because of variants in genes that have yet to be identified as causal. We therefore used advanced computational methods to study the genetic architecture of NMDs and to identify key features that distinguish NMD genes from other genes in the broader genome. Methods: Curated genes implicated in NMDs (n = 639; GeneTable of NMDs) were obtained and merged with a comprehensive set of genomic features for human autosomal protein-coding genes. Machine-learning-based feature selection and ranking were performed using Boruta, along with complementary analytical approaches. These analyses were used to identify the most important genic features (n = 134, subcategories: gene complexity, genetic variation, expression patterns, and other general gene traits) for discriminating NMD genes from other genes in the genome. Results: NMD genes exhibit enriched expression in disease-relevant tissues, including skeletal muscle and heart. Additionally, compared with other protein-coding genes, these genes exhibit increased transcriptomic complexity (e.g., longer transcripts and more unique isoforms), contain more short tandem repeats, and show greater variation in conservation across model organisms. Conclusions: This study identified several key genomic features that may distinguish NMD genes from the rest of the genome. This may enhance the identification of novel causal genes and could ultimately facilitate earlier diagnosis and medical management for affected individuals.
Litster, T. M.; Wilcox, R. A.; Carroll, R.; Gardner, A. E.; Nazri, N. M.; Shoubridge, C. A.; Delatycki, M. B.; Lohmann, K.; Agzarian, M.; Turella Divani, R.; Rafehi, H.; Scott, L.; Monahan, G.; Lamont, P. J.; Ashton, C.; Laing, N. G.; Ravenscroft, G.; Bahlo, M.; Haan, E.; Lockhart, P. J.; Friend, K. L.; Corbett, M. A.; Gecz, J.
The spinocerebellar ataxias (SCAs) are a clinically heterogeneous group of neurodegenerative disorders that affect movement, vision, speech and balance. Here, we reassign the linkage of SCA30 to 14q32.13 based on a cumulative LOD score >12. Within this interval we identified a 331 kb duplication, absent in population controls and not observed in >800 unrelated individuals with genetically unresolved cerebellar ataxia. RNASeq analysis of patient-derived lymphoblastoid cell lines revealed a splice-mediated chimeric transcript resulting from the duplication event. This transcript joined exon 1 of CLMN to exon 2 of SYNE3. In silico translation predicted that this chimeric transcript would produce a short N-terminal peptide corresponding to exon 1 of CLMN and the usually untranslated region of exon 2 of SYNE3 fused to the complete and in-frame SYNE3 protein. Transient overexpression of SYNE3 or the CLMN::SYNE3 fusion protein, in both HeLa cells and mouse primary cortical neurons, resulted in equivalent cellular outcomes including altered nuclear morphology and chromosomal DNA fragmentation. SYNE3 forms part of the linker of nucleoskeleton and cytoskeleton complex and is not usually expressed in cerebellar Purkyně neurons, while CLMN has a Purkyně-specific expression pattern within the brain. Our data suggest that ectopic expression of SYNE3 in cerebellar Purkyně neurons, mediated by the CLMN promoter, leads to cerebellar atrophy and causes spinocerebellar ataxia in the SCA30 family. This is an example of Mendelian disease arising from a novel, chimeric transcript with a likely dominant negative effect. Chimeric transcripts are commonly associated with cancers, but they are not often associated with monogenic disorders. Detection of chimeric transcripts as part of structural variant analysis could increase the genetic diagnostic yield of Mendelian disorders.
Nabunje, R.; Guillen-Guio, B.; Hernandez-Beeftink, T.; Joof, E.; Leavy, O. C.; International IPF Genetics Consortium; Maher, T. M.; Molyneux, P.; Noth, I.; Urrutia, A.; Aburto, M.; Flores, C.; Jenkins, R. G.; Wain, L. V.; Allen, R. J.
Genome-wide association studies of idiopathic pulmonary fibrosis (IPF) have identified 35 common genetic risk loci associated with IPF susceptibility. In this study, we evaluated the effects of the reported variants in clinically curated non-European individuals. Despite limited sample sizes, we observed partial replication, limited transferability of some variants and evidence of ancestry-specific effects. The MUC5B promoter variant rs35705950 emerged as the dominant and most consistent signal across ancestries. Our findings highlight the need for larger, well-characterised studies in understudied populations to support robust discovery and translation.
Kancheva, I. K.; Voigt, S.; Munting, L.; van Dis, V.; Koemans, E.; van Osch, M. J. P.; Wermer, M. J. H.; Hirschler, L.; van Walderveen, M.; Weerd, L. v. d.
A prominent radiological manifestation of cerebral amyloid angiopathy (CAA) is enlargement of perivascular spaces (EPVS), which is suggested to result from fluid stagnation due to impaired perivascular clearance. Here, we report a novel observation of hypointense rims in cerebral white matter surrounding EPVS near haemorrhages on in vivo 7T Gradient Echo MRI. We hypothesised that the observed black rim pattern denotes iron accumulation that may be caused by incomplete clearance following bleeding. We investigated the occurrence and localisation of this marker on in vivo and ex vivo MRI and examined its histopathological correlates. From MRI data of the prospective longitudinal natural history study of hereditary Dutch-type CAA (D-CAA) at Leiden University Medical Centre, we selected the first 20 consecutive patients who underwent 7T imaging and assessed the presence of black rims on MRI. Post-mortem material was available from one donor with black rims on in vivo scans. Formalin-fixed coronal brain slabs were scanned at 7T MRI, including a high-resolution T2*-weighted sequence. Guided by ex vivo MRI, tissue blocks from representative areas with black rims were sampled for histopathological analysis. Serial sections were stained for iron, calcium, myelin, and general tissue morphology. On in vivo 7T MRI, 9 out of 20 participants exhibited one or several black rims, all located close to a haemorrhage. In the D-CAA donor, ex vivo MRI signal loss matched the in vivo contrast changes. Thirty-six vessels with ex vivo-observed black rims were retrieved and histopathologically examined, showing iron accumulation surrounding perivascular spaces, but the pattern and severity of iron deposition varied. Across groups, vessels displayed microvascular degeneration, including hyaline vessel wall thickening, adventitial fibrosis, and perivascular inflammation. We identified black rims on in vivo 7T MRI and confirmed their correspondence on ex vivo imaging. 
Iron deposition was determined as the underlying correlate of black rims, but the histopathology appears heterogeneous. The preferential deposition of iron around EPVS may indicate incomplete clearance of iron-positive blood-breakdown products after bleeding. The varied pattern of iron accumulation and microvascular alterations may reflect different pathophysiological mechanisms related to the formation and maintenance of black rims in D-CAA.
Ren, J.; VA Million Veteran Program; Liu, C.; Hui, Q.; Rahafrooz, M.; Kosik, N. M.; Urak, K.; Moser, J.; Muralidhar, S.; Pereira, A.; Cho, K.; Gaziano, J. M.; Wilson, P. W. F.; Million Veteran Program, V.; Phillips, L. S.; Sun, Y.; Joseph, J.
Background: Heart failure (HF) is a major and growing public health problem, and prior studies support a meaningful genetic contribution to HF susceptibility. Clinically, HF is commonly categorized into the major clinical subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF), which differ in pathophysiology and clinical profiles. However, previous genome-wide association studies have focused on autosomal variation and have routinely excluded the X chromosome, leaving X-linked genetic contributions to HF and its subtypes under-characterized. Methods: We performed an X-chromosome-wide association study (XWAS) using directly genotyped data from 590,568 Million Veteran Program participants, including 90,694 HF cases across European, African, Hispanic, and Asian Americans. Sex- and ancestry-stratified logistic regression was used with XWAS quality control measures, adjusting for age and population structure, followed by fixed-effects multi-ancestry meta-analysis. Functional annotation, gene-based testing, fine-mapping, and colocalization were performed. We replicated genetic associations with all-cause HF in the UK Biobank. Results: In the multi-ancestry meta-analysis, we identified five X-chromosome-wide significant loci for all-cause HF, five for HFrEF, and one locus for HFpEF in males. No loci reached significance in female-specific analyses. In sex-combined analyses, we identified six loci for all-cause HF and four for HFrEF. The strongest signals mapped to the genes BRWD3, FHL1, and CHRDL1. Ancestry-specific analyses revealed additional loci, including NDP and WDR44 in African ancestry and PHF8 in Hispanic ancestry. One locus, BRWD3, was replicated in the UK Biobank HF cohort. Integrated post-GWAS analyses (fine-mapping, colocalization and pleiotropy trait association studies) reinforced the biological plausibility of the X-linked signals. 
Conclusions: This multi-ancestry, sex-stratified XWAS identifies X-linked genetic contributions to HF and its subtypes and highlights the role of the X chromosome in heart failure pathogenesis.
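As a reading aid for the "fixed-effects multi-ancestry meta-analysis" step named in the abstract above, the following is a minimal sketch of standard inverse-variance-weighted pooling of per-stratum effect estimates. It is not the authors' pipeline; the function name `fixed_effects_meta` and the example numbers are hypothetical and chosen only for illustration.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects pooling of per-stratum estimates.

    betas: per-stratum effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    # Each stratum is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Hypothetical log odds ratios for one variant in two ancestry strata.
beta, se, z = fixed_effects_meta([0.10, 0.30], [0.10, 0.10])
print(f"pooled beta={beta:.3f}, se={se:.4f}, z={z:.2f}")
```

With equal standard errors the pooled estimate is the simple mean of the inputs; a stratum with a smaller standard error would pull the pooled estimate toward its own value.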
Wang, Y.; Tuftin, B.; Raffield, L. M.; Hidalgo, B.; Kerns, S. L.; DeWan, A. T.; Leal, S. M.; Auer, P.
Individuals with admixed ancestry comprise a significant proportion of populations of the Americas. Statistical methods have been developed to specifically leverage local ancestry inference to enhance the power and interpretability of genome-wide association studies in admixed populations. However, no such methods currently exist to test for rare-variant aggregate associations. Here we present LANTERN (Leveraging local ANcestry Tracts to Enhance Rare variaNt aggregate associations), a method that infers the alleles that lie on each ancestral haplotype and conducts rare-variant aggregate association testing in a generalized linear mixed model framework. Through simulation studies we demonstrated that LANTERN achieves proper control of Type 1 error while boosting power to detect associations when causal alleles predominately lie on one ancestral haplotype. Using data from a cohort of African American participants from the Jackson Heart Study, LANTERN identified two genes known to be involved in red-blood cell (RBC) biology when local ancestry information was incorporated. Specifically, a burden of rare alleles on European ancestral haplotypes in EPO was associated with both hemoglobin levels (HGB) and RBC counts, whereas a burden of rare alleles on African ancestral haplotypes in EPB42 was associated with HGB and RBC. In summary, LANTERN (i) allows for the identification of ancestry-specific rare-variant associations; and (ii) enhances rare-variant association signals compared to an analysis that ignores local ancestry. LANTERN is implemented in R and is freely available on GitHub.
Zheng, W.; Liu, T.; Xu, L.; Xie, Y.; Jing, Y.; Shao, H.; Zhao, H.
Phenome-wide association studies (PheWAS) enable systematic exploration of relationships between genetic variants and clinical phenotypes derived from electronic health records (EHRs). Conventional regression-based PheWAS treats phenotypes separately and relies on binary phenotype representations, which limits statistical power for rare variants and rare phenotypes and reduces the ability to detect associations with phenotypes that are distributed across clinical codes. To address this limitation, we first developed EmbedPheScan, a phenotype embedding-based prioritization framework that summarizes the phenotypic profiles of rare loss-of-function variant carriers in a continuous embedding space. We then proposed EA-PheWAS by combining these embedding-derived signals with conventional regression-based PheWAS results using the aggregated Cauchy association test. Using the UK Biobank whole-exome sequencing and EHR data, we show that the proposed methods maintain appropriate false-positive control. We then performed genome-wide phenome scans across all genes and across biologically defined gene classes to evaluate EA-PheWAS relative to conventional PheWAS and EmbedPheScan, consistently finding that EA-PheWAS outperformed the other two methods. We illustrate the utility of EA-PheWAS focusing on four genes representing distinct scenarios, including strong-effect disease genes (PKD1, PKD2), genes with large numbers of rare LoF carriers (NF1), and genes with extremely sparse carrier counts (FBN1).
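The abstract above combines two p-value sources with the aggregated Cauchy association test (ACAT). The sketch below shows only the generic ACAT combination rule, assuming equal weights by default; the function name `acat` and the input p-values are hypothetical, and how EA-PheWAS derives or weights its inputs is not described here.

```python
import math

def acat(p_values, weights=None):
    """Combine p-values with the aggregated Cauchy association test.

    Each p-value in (0, 1) is mapped to a standard Cauchy variate via
    tan((0.5 - p) * pi); the weighted mean of these variates is converted
    back to a p-value with the Cauchy tail probability.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total_weight = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    return 0.5 - math.atan(statistic) / math.pi

# Hypothetical inputs: an embedding-derived p-value and a regression p-value.
combined = acat([0.001, 0.20])
print(f"combined p = {combined:.4g}")
```

A useful property of this rule is that the combined p-value is dominated by the smallest inputs, so one strong signal is not washed out by a weak one, and the null approximation tolerates correlation between the inputs.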
Gunnarsson, C.; Ellegard, R.; Ahsberg, J.; Huda, S.; Andersson, J.; Dworeck, C. F.; Glaser, N.; Erlinge, D.; Loghman, H.; Johnston, N.; Mannila, M.; Pagonis, C.; Ravn-Fischer, A.; Rydberg, E.; Welen Schef, K.; Tornvall, P.; Sederholm Lawesson, S.; Swahn, E. E.
Background: Spontaneous coronary artery dissection (SCAD) is a well-recognised cause of acute coronary syndrome, particularly among women without conventional cardiovascular risk factors. Increasing evidence indicates a genetic contribution; however, the underlying genetic architecture of SCAD remains insufficiently understood. Objective: The aim of this study was to assess the prevalence of rare variants in previously reported SCAD-associated genes and to explore the potential presence of novel genetic alterations in well-characterised Swedish patients with SCAD. Methods: The study comprised 201 patients enrolled in SweSCAD, a national project examining the clinical characteristics, aetiology, and outcomes of SCAD. All individuals had a confirmed diagnosis based on invasive coronary angiography. Comprehensive exome sequencing was performed to identify rare variants contributing to disease susceptibility. Results: Genetic variants that have been associated with SCAD according to current clinical genetics practice for variant reporting were identified in approximately 4% of patients. In addition, rare potentially relevant variants were detected in almost 60% of patients in genes associated with vascular integrity and vascular remodelling. Conclusion: This study supports SCAD as a genetically complex arteriopathy, driven by rare high-impact variants together with broader polygenic susceptibility. Variants in collagen, vascular extracellular matrix, and oestrogen-responsive pathways provide biologically plausible links to female-predominant disease. Although the diagnostic yield of clearly actionable variants is modest, these findings support broader genomic evaluation beyond overt syndromic presentations and highlight the need for larger integrative genomic and functional studies to refine risk stratification and management.
Tomasi, J.; Xu, H.; Zhang, L.; Carey, C. E.; Schoenberger, M.; Yates, D. P.; Casas, J.
Background: Elevated lipoprotein(a) [Lp(a)] is a risk factor for several cardiovascular diseases, as established by multiple genetic and observational studies. However, the underlying mechanisms mediating the effects of Lp(a) levels on cardiovascular disease risk and major adverse cardiovascular events (MACE) are unclear. The aim of this study was to identify proteins downstream of Lp(a) using Mendelian randomization (MR), a genetic causal inference approach. Methods: A two-sample MR was performed by initially identifying Lp(a) genetic instruments based on data from genome-wide association studies (GWAS) of Lp(a) blood concentrations. These instruments were then tested for association with proteins from proteomic pQTL data (Olink from UK Biobank, 2940 proteins and SomaScan from deCODE, 4907 proteins). Results: A total of 521 proteins associated with Lp(a) were identified. Using pathway enrichment analysis, the following MACE-relevant pathways were identified comprising a total of 91 Lp(a) downstream proteins: oxidized phospholipid-related, chemotaxis of immune cells and endothelial cell activation, pro-inflammatory monocyte activation, neutrophil activity, coagulation, and lipid metabolism. Conclusion: The results suggest that the influence of Lp(a) treatments is primarily through modifying inflammation rather than lipid-lowering, thus providing insight into the mechanistic framework which mediates the effects of elevated Lp(a) on atherosclerotic cardiovascular disease.
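To make the two-sample MR idea in the abstract above concrete, here is a minimal sketch of the widely used inverse-variance-weighted (IVW) estimator. Whether the authors used IVW or another MR estimator is not stated in the abstract, so this is an assumption; the function name `ivw_estimate` and all summary statistics below are invented for illustration.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted two-sample MR estimate.

    beta_exposure: per-instrument effects on the exposure (e.g. Lp(a) level)
    beta_outcome:  per-instrument effects on the outcome (e.g. a protein level)
    se_outcome:    standard errors of the outcome effects
    Returns (causal_estimate, standard_error).
    """
    # Weighted regression of outcome effects on exposure effects through the origin.
    numerator = sum(
        bx * by / s ** 2
        for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical summary statistics for three instruments.
estimate, se = ivw_estimate([0.2, 0.4, 0.1], [0.1, 0.2, 0.05], [0.02, 0.03, 0.01])
print(f"IVW estimate={estimate:.3f}, se={se:.4f}")
```

In this toy input every outcome effect is exactly half of the exposure effect, so the estimator recovers a slope of 0.5; real instruments scatter around the slope, and the weighting gives more influence to precisely estimated ones.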
Pan, H.; Wang, D.
Background: Cardiometabolic diseases arise from metabolic dysfunction that develops decades before clinical onset. Conventional genetic risk models are typically derived in middle-aged or older populations, where genetic effects are confounded by cumulative environmental exposures, chronic comorbidities, and clinical interventions. Whether the life stage at which genetic liability is modelled influences the biological signal captured by polygenic scores remains unclear, particularly in underrepresented populations. We therefore tested whether genetic liability modelled in early adulthood, a period of relative physiological stability, is associated with cardiometabolic risk across the life course in Asian populations. Methods: We developed a polygenic score for metabolic syndrome, GenMetS, using data from 1,368 Singaporean women aged 18-45 years. The model integrates 15 established polygenic scores for metabolic traits and applies elastic-net penalized regression to optimize variant weights. GenMetS was evaluated in five cohorts comprising 670,952 individuals aged 0-94 years across population-based and disease-enriched settings, including Asian and European ancestry groups. Associations with metabolic traits, cardiometabolic diseases, multimorbidity, and early-life growth patterns were assessed. Results: In Asian populations, GenMetS explained 5.0-12.4% of the variance in metabolic syndrome in adults and 10.3% in children, with negligible performance in European populations (R² < 0.001). Higher GenMetS was associated with increased odds of cardiometabolic diseases, including type 2 diabetes, heart failure, and stroke (odds ratios 1.32-1.52 per standard deviation). In UK Biobank participants of Asian ancestry, GenMetS improved discrimination of cardiometabolic multimorbidity beyond age alone. Associations were consistent across sexes. In children, higher GenMetS was associated with obesogenic growth trajectories and increased abdominal adiposity. 
Conclusions: Genetic liability to metabolic dysfunction modelled in early adulthood captures a stable biological signal associated with metabolic traits, disease risk, and multimorbidity from childhood to adulthood in Asian populations. These findings indicate that the life stage of model derivation shapes the biological signal captured by polygenic scores and support the development of life-stage and ancestry-informed approaches for cardiometabolic risk assessment and prevention.
Vergara, C.; Ni, Z.; Zhong, J.; McKean, D.; Connelly, K. E.; Antwi, S. O.; Arslan, A. A.; Bracci, P. M.; Du, M.; Gallinger, S.; Genkinger, J.; Haiman, C. A.; Hassan, M.; Hung, R. J.; Huff, C.; Kooperberg, C.; Kastrinos, F.; LeMarchand, L.; Lee, W.; Lynch, S. M.; Moore, S. C.; Oberg, A. L.; Park, M. A.; Permuth, J. B.; Risch, H. A.; Scheet, P.; Schwartz, A.; Shu, X.-O.; Stolzenberg-Solomon, R. Z.; Wolpin, B. M.; Zheng, W.; Albanes, D.; Andreotti, G.; Bamlet, W. R.; Beane-Freeman, L.; Berndt, S. I.; Brennan, P.; Buring, J. E.; Cabrera-Castro, N.; Campa, D.; Canzian, F.; Chanock, S. J.; Chen, Y.;
Pancreatic cancer disproportionately affects Black individuals in the United States, but they have limited representation in genetic studies of pancreatic ductal adenocarcinoma (PDAC). To address this gap, we performed admixture mapping and genome-wide association analysis (GWAS) in genetically inferred African ancestry individuals (1,030 cases and 889 controls). Admixture mapping identified three regions with a significantly higher proportion of African ancestry in cases compared to controls (5q33.3, 10p1, 22q12.3). GWAS identified a genome-wide significant association at 5p15.33 (CLPTM1L, rs383009:T>C, T allele frequency = 0.51, OR = 1.45, P = 1.24 × 10⁻⁸), a locus previously associated with PDAC. Known loci at 5p15.33, 7q32.3, 8q24.21 and 7q25.1 also replicated (P < 0.01). Multi-ancestral fine-mapping identified two potential causal SNPs (rs3830069 and rs2735940) at 5p15.33. Collectively, these findings identified novel PDAC risk loci and expanded our understanding of this deadly cancer in underrepresented populations, emphasizing the multifactorial nature of PDAC risk, including inherited genetic and non-genetic factors. Statement of Significance: To understand how genetic variation contributes to PDAC risk in Black people in North America, we studied individuals of genetically inferred African ancestry. We identified novel risk loci and differences in the contribution of known loci. This demonstrates that ancestry-informed genetic analyses improve our understanding of PDAC risk and enhance discovery.
Chauquet, S.; Jiang, J.-C.; Barker, L. F.; Hunter, Z. L.; Singh, G.; Wray, N. R.; McRae, A. F.; Shah, S.
Drug targets supported by human genetic evidence have significantly higher approval rates, making genome-wide association studies a valuable resource for drug candidate prioritisation. Transcriptome-wide association study signature-matching is an emerging in silico approach that integrates GWAS data with expression quantitative trait loci to generate a disease gene expression signature, which is then compared against drug perturbation databases such as the Connectivity Map. Despite recent adoption, there is no consensus on optimal methodology. Here, we systematically benchmark key parameters, including TWAS method, eQTL tissue model, similarity metric, gene set size, and CMap cell line, using LDL cholesterol, familial combined hyperlipidemia, and asthma as proof-of-concept traits. We demonstrate that while TWAS signature-matching can successfully prioritise known first-line treatments, performance is highly sensitive to parameter choice; for instance, the selection of the cell line used for drug signatures alone can dramatically alter drug prioritisation. Based on these findings, we propose a best-practice framework for robust, genetically-informed drug prioritisation using TWAS signature-matching.
Romero, C.; Wightman, D. P.; Jurgens, S.; van Walree, E.; Corver, M.; Haydarlou, P.; Schipper, M.; Bezzina, C.; Posthuma, D.; van der Sluis, S.
Cardiovascular diseases (CVDs) frequently co-occur, yet the shared genetic basis of cardiovascular multimorbidity remains unclear. We analysed common- and rare-variant genetic overlap across eight major CVDs using genome-wide and exome-wide association data from ~1.7 million individuals in European and East Asian biobanks. Fifteen CVD pairs showed significant genetic correlations, with shared common-variant covariance explaining a modest proportion of phenotypic comorbidity. Genomic structural equation modelling identified three latent genetic clusters, while pleiotropic loci and genes frequently spanned cluster boundaries. Prioritised genes converged on atherosclerosis-related processes, myocardial structural and electrical programmes, and vascular-wall biology. In conditional analyses, body composition and metabolic traits consistently attenuated shared genetic liability, whereas circulating biomarkers showed smaller effects. For adequately powered traits, common-variant architecture was broadly similar between European and East Asian ancestries. These results define a shared genetic framework for cardiovascular multimorbidity centred on systemic risk factors and vascular biology.
Clavere, N. G.; Kim, J. H.; Letcher, K. P.; Molakaseema, S. T.; Silva, K.; Pal, S.; Becker, J. R.
Introduction: Hypertrophic cardiomyopathy (HCM) is a disease defined by the development of left ventricular hypertrophy. One of the most commonly mutated genes in HCM is cardiac myosin binding protein C (MYBPC3). MYBPC3 protein localizes to the cardiomyocyte sarcomere, but studies have reported detection of both MYBPC3 RNA and protein in non-cardiomyocyte cell populations. Therefore, it was unclear if MYBPC3 expression in non-cardiomyocyte cell populations altered the development of cardiomyopathy caused by MYBPC3 protein deficiency. Methods: We utilized genetically modified murine models with germline deletion of Mybpc3 exons 3 to 5 (Mybpc3-/-) or cardiomyocyte-specific deletion of Mybpc3 exons 3 to 5 (Mybpc3fl/fl; Myh6-Cre). Gene expression was assessed using quantitative RT-PCR. Whole tissue protein levels were assessed using immunoblots. Immunohistochemistry and proximity ligation assays were performed to evaluate in situ protein expression. Echocardiography was utilized to measure left ventricular structure and function. Results: Mybpc3 mRNA was detected in multiple organs including the heart, lung and blood from both humans and mice. Utilizing transgenic murine models with germline or cardiomyocyte-specific deletion of Mybpc3 exons 3-5, we discovered that the Mybpc3 mRNA detected in extracardiac locations originated primarily from cardiomyocytes. Likewise, MYBPC3 protein was identified in myocardial tissue but not in other organs, and cardiomyocytes were the only cell population in myocardial tissue that had detectable MYBPC3 protein. Importantly, cardiomyocyte deletion of Mybpc3 caused similar pathological myocardial remodeling and alterations in left ventricular function compared to germline deletion of Mybpc3 in all cell populations. 
Conclusions: Our results show that cardiomyocytes are the primary cell source of Mybpc3 mRNA detected in extracardiac organs and they are the principal cell type responsible for the cardiomyopathy caused by MYBPC3 protein deficiency. These results suggest that selective targeting of cardiomyocytes should be the most efficient approach to treat cardiomyopathies associated with MYBPC3 deficiency.
Ngo, A.; Guindon, S.; Pedergnana, V.
Understanding how genetic variation in pathogens influences clinical phenotypes observed in infected hosts is a fundamental challenge in evolutionary genomics and public health. Phenotypic traits such as infection severity are often non-randomly distributed within the pathogen's phylogeny, suggesting the existence of evolutionary determinants but also violating the independence assumption underlying classical genome-wide association studies and potentially leading to inflated false positive rates. We present MutaPhy, a phylogeny-based method aimed at detecting correlations between a binary host phenotype and the corresponding pathogen genome by directly utilizing the hierarchical structure of phylogenetic trees. MutaPhy encompasses three different scales: (i) a subtree scale, on which relevant clades over-representing the phenotype of interest are detected using permutation-based tests; (ii) a tree scale, which agglomerates local signals into a global association statistic; and (iii) a site scale, whereby candidate mutational events on branches leading to significant clades are examined using ancestral sequence reconstruction. We evaluate the statistical behavior and detection performance of MutaPhy using simulations under diverse evolutionary scenarios. We also compare this tool to several existing phylogenetic association methods. As illustrative applications, we apply MutaPhy to dengue virus and hepatitis C virus datasets associated with clinical phenotypes in human hosts. Our results highlight the ability of the proposed approach to detect viral lineages associated with over-represented phenotypes while revealing limited evidence for robust mutation-level associations in these particular datasets. Altogether, MutaPhy provides a framework for guiding genotype-phenotype association analyses by leveraging phylogenetic structure, thereby reducing false positive findings and improving the interpretability of association signals.
Mavura, Y.; Crosslin, D.; Ferar, K. D.; Lawlor, J. M.; Greally, J. M.; Hindorff, L.; Jarvik, G. P.; Kalla, S.; Koenig, B. A.; Kvale, M.; Kwok, P.-Y.; Norton, M.; Plon, S. E.; Powell, B. C.; Slavotinek, A.; Thompson, M. L.; Popejoy, A. B.; Kenny, E. E.; Risch, N.
Show abstract
Purpose: Diagnostic yield from exome and genome sequencing varies widely across studies. It remains unclear how much of this variation reflects patient-level factors (e.g., sex, clinical features, race/ethnicity, genetic ancestry) versus site-level practices such as sequencing modality or variant interpretation workflows. We aimed to quantify the contributions of these factors to diagnostic outcomes across five U.S. clinical sequencing sites. Methods: We performed a cross-sectional analysis of 3,008 prenatal, neonatal, and pediatric cases from the NHGRI Clinical Sequencing Evidence-Generating Research (CSER) consortium (2017-2023). Clinical indications spanned neurodevelopmental, neurological, immunological, metabolic, craniofacial, skeletal, cardiac, prenatal, and oncologic presentations. Genetic ancestry was inferred from sequencing data, and variants were interpreted using ACMG/AMP guidelines to classify DNA-based diagnoses. Generalized linear mixed models were used to estimate associations between diagnostic yield and fixed effects (sex, prenatal status, isolated cancer, number of clinical indications, sequencing modality, race/ethnicity, and genetic ancestry), while modeling study site as a random effect to quantify between-site variation. Results: The overall diagnostic yield was 19.0%. Multiple clinical indications (OR=1.47, 95% CI 1.20-1.80, p<0.001) were associated with higher diagnostic yield, and male sex (OR=0.80, 95% CI 0.66-0.96, p=0.017) and prenatal status (OR=0.63, 95% CI 0.44-0.90, p=0.012) were associated with lower yield. Sequencing modality, race/ethnicity, genetic ancestry, and isolated cancer were not statistically significantly associated with diagnostic outcomes. A model without fixed effects attributed ~10% of variance in diagnostic yield to between-site differences. After adjusting for covariates, site-level variance decreased to 5.7%, indicating consistent variation across sites not explained by measured patient factors. Conclusion: Across five sites, patient-level clinical features influenced diagnostic yield, but substantial site-level variation remained even after adjustment. Differences in variant interpretation or case-classification practices may contribute to this residual variability. Further efforts to increase consistency in exome- and genome-sequencing diagnostic workflows may help reduce inter-site differences.
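The share of variance attributed to study site in a logistic mixed model is often reported as an intraclass correlation on the latent scale. The abstract does not say how its percentages were derived, so the sketch below is only one common convention: it assumes a random-intercept logistic model in which the residual variance is fixed at π²/3. The function names `site_icc` and `site_variance_for_icc` are hypothetical.

```python
import math

# Assumption: latent-scale residual variance of a logistic model,
# i.e. the variance of the standard logistic distribution.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def site_icc(site_variance: float) -> float:
    """Share of latent-scale variance due to the site random intercept."""
    if site_variance < 0:
        raise ValueError("variance must be non-negative")
    return site_variance / (site_variance + LOGISTIC_RESIDUAL_VAR)


def site_variance_for_icc(icc: float) -> float:
    """Random-intercept variance that would yield a given ICC."""
    if not 0 <= icc < 1:
        raise ValueError("icc must be in [0, 1)")
    return icc * LOGISTIC_RESIDUAL_VAR / (1 - icc)


if __name__ == "__main__":
    # Site-level shares quoted in the abstract (unadjusted and adjusted).
    for share in (0.10, 0.057):
        variance = site_variance_for_icc(share)
        print(f"ICC {share:.3f} <-> site intercept variance {variance:.3f}")
```

Under this convention, a drop in the site share after covariate adjustment corresponds to a smaller estimated random-intercept variance, not to a change in the fixed residual term.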
Sakaue, S.; Yang, D.; Zhang, H.; Posner, D.; Rodriguez, Z.; Love, Z.; Cui, J.; Budu-Aggrey, A.; Ho, Y.-L.; Costa, L.; Monach, P.; Huang, S.; Ishigaki, K.; Melley, C.; Tanukonda, V.; Sangar, R.; Maripuri, M.; Sweet, S. M.; Panickan, V.; McDermott, G.; Hanberg, J. S.; Riley, T.; Laufer, V.; Okada, Y.; Scott, I.; Bridges, S. L.; Baker, J.; VA Million Veteran Program, ; Wilson, P. W.; Gaziano, J. M.; Hong, C.; Verma, A.; Cho, K.; Huffman, J. E.; Cai, T.; Raychaudhuri, S.; Liao, K. P.
Rheumatoid arthritis (RA) is a heritable and common autoimmune condition. To date, most genetic associations were derived from individuals with either European or East Asian ancestries. Here, we applied a multimodal automated phenotyping strategy to define RA and performed a genome-wide association study (GWAS) of RA in the Million Veteran Program (MVP), including underrepresented African American (AFR) and Admixed American (AMR) populations. Meta-analyses with previous RA cohorts identified 152 autosomal genome-wide significant loci, of which 31 were novel. Inclusion of multi-ancestry data dramatically improved fine-mapping resolution. Functional characterization of these loci using single-cell transcriptomic and chromatin data suggested new RA genes such as CHD7 and CD247. We identified underappreciated functional roles of fine-grained immune cell states other than T cells, such as B cell and myeloid cell states. We observed that multi-ancestry polygenic risk scores using our data demonstrated better predictive ability, especially for AFR and AMR populations.
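The abstract does not specify which meta-analysis method was used to combine MVP with previous RA cohorts. As a generic illustration only, the following sketch shows a fixed-effect inverse-variance-weighted meta-analysis of per-cohort effect estimates for a single variant. The function name `ivw_meta` and the example numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal length")
    weights = [1.0 / se ** 2 for se in ses]  # precision of each cohort
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


if __name__ == "__main__":
    # Hypothetical effect sizes from three cohorts of different ancestry.
    beta, se, z = ivw_meta([0.12, 0.08, 0.15], [0.03, 0.05, 0.06])
    print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

A fixed-effect model assumes a shared effect across cohorts; multi-ancestry analyses sometimes use random-effects or ancestry-aware models instead.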
Leduc, A.; Bachr, A.; Sandron, F.; Delepine, M.; Delafoy, D.; Fund, C.; Daviaud, C.; Meslage, S.; Turon, V.; Bacq-Daian, D.; Rousseau, F.; Olaso, R.; Deleuze, J.-F.; Gerber, Z.; Meyer, V.
Background: Short read sequencing technologies have dominated the field of human whole genome sequencing in recent years in terms of cost, throughput, and accuracy. However, thanks to recent technological evolution, long read approaches have become increasingly competitive and complementary to short reads. With the gap in the cost per genome closing slowly between both approaches, long reads might replace short read sequencing in future research and clinical applications. Still, comprehensive evaluation is necessary to draw conclusions about the performance and general advantages of each technology. Results: In this study, we compared the latest chemistries of major suppliers of short and long read technologies: Illumina short reads, Illumina Complete Long Reads (ICLR), Pacific Biosciences HiFi reads (PacBio), and Oxford Nanopore Technologies long reads (ONT). Using the HG002 human reference sample and established bioinformatics guidelines, we assessed their variant calling performance against the latest available truth sets at different levels of coverage. For single nucleotide variant detection, all technologies were equivalent. Despite the latest improvements in chemistry, indel calling with ONT continues to lag in accuracy behind other technologies. In contrast, long reads delivered a clear advantage in structural variant detection, surpassing short reads in both accuracy and sensitivity. The hybrid ICLR approach achieved intermediate performance, narrowing the gap between short and long read sequencing. Furthermore, long reads enhanced haplotype-phasing resolution, enabling the phasing of over 80% of the genome. Conclusions: These findings highlight the specific strengths and limitations of recent sequencing technologies, aiding decision-making in future research projects, technology platform development, and clinical applications.
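Comparisons of variant calls against a truth set are usually summarised by precision, recall (sensitivity), and F1. The abstract does not list the exact metrics or tools used, so the following is a minimal sketch under the assumption that true positives, false positives, and false negatives have already been counted per technology and variant type. The function name `calling_metrics` and the example counts are hypothetical.

```python
def calling_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from counts against a truth set.

    tp: calls matching a truth-set variant
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for label, counts in {"platform_A": (900, 50, 100),
                          "platform_B": (950, 80, 50)}.items():
        m = calling_metrics(*counts)
        print(label, {k: round(v, 3) for k, v in m.items()})
```

Reporting these values per variant class (SNV, indel, structural variant) and per coverage level makes differences between technologies directly comparable.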
Hou, K.; Pazokitoroudi, A.; Strober, B.; Jiang, X.; Price, A. L.
Proteome-wide association studies (PWAS) typically link genetically predicted protein levels to disease using cis-pQTLs, which can be limited by low cis-heritability for disease-critical genes under negative selection and by tagging due to co-regulation among nearby genes. Trans-pQTLs provide complementary information when large sample sizes are available to detect weak polygenic effects, enabling associations between trans-predicted protein levels and disease. We developed PolyPWAS, a functionally informed, summary statistics-based framework for associating both cis- and trans-predicted protein levels to disease. PolyPWAS integrates 96 functional annotations with proteome-wide pleiotropy to improve protein prediction, while correcting for PCs of predicted protein levels to limit tagging effects. We applied PolyPWAS to 2.8K plasma proteins measured in 34K UKB-PPP participants, analyzing GWAS summary statistics for 88 diseases and complex traits (average N=336K). Trans-predicted protein levels explained 21% of disease heritability (vs. 9.6% for cis-predicted protein levels), leveraging a 24% relative improvement in trans-prediction accuracy from functional priors. Trans-PWAS identified more significant protein-disease associations (and more conditionally significant associations) than cis-PWAS. Cis and trans associations showed only modest excess overlap (1.18, 95% CI: 1.11-1.26). Accordingly, combining evidence from cis and trans associations improved disease gene prioritization evaluated using gene sets from rare variant association studies (+11% relative improvement) and PoPS (+7.0% relative improvement) relative to cis-only approaches. PWAS associations to disease replicated across protein level cohorts, with strong UKB-PPP/deCODE concordance after adjusting for cohort-specific prediction accuracy. We provide examples where trans-regulatory effects link multiple disease-critical genes, underscoring the importance of integrating cis- and trans-regulatory effects to map protein-mediated disease biology.
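The abstract does not give the PolyPWAS test statistic. For orientation only, the sketch below shows the generic summary-statistics association test used in TWAS/PWAS-style analyses, assuming that prediction weights, GWAS z-scores, and an LD correlation matrix are available for the same variants. It omits the functional priors and PC correction described above, and the function name `predicted_protein_z` is hypothetical.

```python
import math


def predicted_protein_z(weights, gwas_z, ld):
    """Association z-score between a predicted protein level and a trait.

    weights: per-variant prediction weights for the protein
    gwas_z:  per-variant GWAS z-scores for the trait
    ld:      LD correlation matrix (list of lists) for the same variants
    Computes (w . z) / sqrt(w' R w).
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching sizes")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("predicted protein level has no variance")
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical three-variant example.
    w = [0.5, -0.2, 0.1]
    z = [3.1, -1.0, 0.4]
    r = [[1.0, 0.3, 0.0],
         [0.3, 1.0, 0.1],
         [0.0, 0.1, 1.0]]
    print(f"association z = {predicted_protein_z(w, z, r):.3f}")
```

The same calculation applies to cis or trans weights; only the set of variants and the source of the weights differ.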